{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Developing, Training, and Deploying a TensorFlow model on Google Cloud Platform (completely within Jupyter)\n",
    "\n",
    "\n",
    "In Chapter 9 of [Data Science on the Google Cloud Platform](http://shop.oreilly.com/product/0636920057628.do), I trained a TensorFlow Estimator model to predict flight delays.\n",
    "\n",
    "In this notebook, we'll modernize the workflow:\n",
    "* Use eager mode for TensorFlow development\n",
    "* Use tf.data to write the input pipeline\n",
    "* Run the notebook as-is on Cloud using Deep Learning VM or Kubeflow pipelines\n",
    "* Deploy the trained model to Cloud ML Engine as a web service\n",
    "\n",
    "The combination of eager mode, tf.data and DLVM/KFP makes this workflow a lot easier.\n",
    "We don't need to deal with Python packages or Docker containers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# change these to try this notebook out\n",
    "# In \"production\", these will be replaced by the parameters passed to papermill\n",
    "BUCKET = 'cloud-training-demos-ml'\n",
    "PROJECT = 'cloud-training-demos'\n",
    "REGION = 'us-central1'\n",
    "DEVELOP_MODE = True\n",
    "EAGER_MODE = False\n",
    "NBUCKETS = 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['REGION'] = REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Updated property [core/project].\n",
      "Updated property [compute/region].\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating the input data pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_BUCKET = \"gs://cloud-training-demos/flights/chapter8/output/\"\n",
    "TRAIN_DATA_PATTERN = DATA_BUCKET + \"train*\"\n",
    "VALID_DATA_PATTERN = DATA_BUCKET + \"test*\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gs://cloud-training-demos/flights/chapter8/output/delays.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00000-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00001-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00002-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00003-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00004-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00005-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/testFlights-00006-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00000-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00001-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00002-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00003-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00004-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00005-of-00007.csv\n",
      "gs://cloud-training-demos/flights/chapter8/output/trainFlights-00006-of-00007.csv\n"
     ]
    }
   ],
   "source": [
    "!gcloud storage ls $DATA_BUCKET"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use tf.data to read the CSV files\n",
    "\n",
    "Note the use of Eager mode to develop the code.  I turn off eager mode once I get to the next section"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensorflow version 1.10.1\n"
     ]
    }
   ],
   "source": [
    "import os, json, math\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "print(\"Tensorflow version \" + tf.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "if DEVELOP_MODE and EAGER_MODE:\n",
    "    print(\"Enabling Eager mode for development only\")\n",
    "    tf.enable_eager_execution()\n",
    "else:\n",
    "    EAGER_MODE = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "CSV_COLUMNS  = ('ontime,dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + \\\n",
    "                ',carrier,dep_lat,dep_lon,arr_lat,arr_lon,origin,dest').split(',')\n",
    "LABEL_COLUMN = 'ontime'\n",
    "DEFAULTS     = [[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],\\\n",
    "                ['na'],[0.0],[0.0],[0.0],[0.0],['na'],['na']]\n",
    "\n",
    "def decode_csv(line):\n",
    "  column_values = tf.decode_csv(line, DEFAULTS)\n",
    "  column_names = CSV_COLUMNS\n",
    "  decoded_line = dict(zip(column_names, column_values)) # create a dictionary {'column_name': value, ...} for each line  \n",
    "  return decoded_line\n",
    "\n",
    "def load_dataset(pattern):\n",
    "  filenames = tf.data.Dataset.list_files(pattern)\n",
    "  dataset = filenames.interleave(tf.data.TextLineDataset, cycle_length=16)\n",
    "  dataset = dataset.map(decode_csv)\n",
    "  return dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "if EAGER_MODE:\n",
    "    dataset = load_dataset(TRAIN_DATA_PATTERN)\n",
    "    for n, data in enumerate(dataset):\n",
    "      numpy_data = {k: v.numpy() for k, v in data.items()} # .numpy() works only in eager mode\n",
    "      print(numpy_data)\n",
    "      if n>3: break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting example_input.json\n"
     ]
    }
   ],
   "source": [
    "%%writefile example_input.json\n",
    "{\"dep_delay\": 14.0, \"taxiout\": 13.0, \"distance\": 319.0, \"avg_dep_delay\": 25.863039, \"avg_arr_delay\": 27.0, \"carrier\": \"WN\", \"dep_lat\": 32.84722, \"dep_lon\": -96.85167, \"arr_lat\": 31.9425, \"arr_lon\": -102.20194, \"origin\": \"DAL\", \"dest\": \"MAF\"}\n",
    "{\"dep_delay\": -9.0, \"taxiout\": 21.0, \"distance\": 301.0, \"avg_dep_delay\": 41.050808, \"avg_arr_delay\": -7.0, \"carrier\": \"EV\", \"dep_lat\": 29.984444, \"dep_lon\": -95.34139, \"arr_lat\": 27.544167, \"arr_lon\": -99.46167, \"origin\": \"IAH\", \"dest\": \"LRD\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def features_and_labels(features):\n",
    "  label = features.pop('ontime') # this is what we will train for\n",
    "  return features, label\n",
    "  \n",
    "def prepare_dataset(pattern, batch_size, truncate=None, mode=tf.estimator.ModeKeys.TRAIN):\n",
    "  dataset = load_dataset(pattern)\n",
    "  dataset = dataset.map(features_and_labels)\n",
    "  dataset = dataset.cache()\n",
    "  if mode == tf.estimator.ModeKeys.TRAIN:\n",
    "    dataset = dataset.shuffle(1000)\n",
    "    dataset = dataset.repeat()\n",
    "  dataset = dataset.batch(batch_size)\n",
    "  dataset = dataset.prefetch(10)\n",
    "  if truncate is not None:\n",
    "    dataset = dataset.take(truncate)\n",
    "  return dataset\n",
    "\n",
    "if EAGER_MODE:\n",
    "    print(\"Calling prepare\")\n",
    "    one_item = prepare_dataset(TRAIN_DATA_PATTERN, batch_size=2, truncate=1)\n",
    "    print(list(one_item)) # should print one batch of 2 items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create TensorFlow wide-and-deep model\n",
    "\n",
    "We'll create feature columns, and do some discretization and feature engineering.\n",
    "See the book for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow.feature_column as fc\n",
    "\n",
    "real = {\n",
    "    colname : fc.numeric_column(colname) \\\n",
    "          for colname in \\\n",
    "            ('dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + \n",
    "             ',dep_lat,dep_lon,arr_lat,arr_lon').split(',')\n",
    "}\n",
    "sparse = {\n",
    "      'carrier': fc.categorical_column_with_vocabulary_list('carrier',\n",
    "                  vocabulary_list='AS,VX,F9,UA,US,WN,HA,EV,MQ,DL,OO,B6,NK,AA'.split(',')),\n",
    "      'origin' : fc.categorical_column_with_hash_bucket('origin', hash_bucket_size=1000),\n",
    "      'dest'   : fc.categorical_column_with_hash_bucket('dest', hash_bucket_size=1000)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['carrier', 'origin', 'dest', 'dep_loc', 'arr_loc', 'dep_arr', 'ori_dest'])\n",
      "dict_keys(['dep_delay', 'taxiout', 'distance', 'avg_dep_delay', 'avg_arr_delay', 'dep_lat', 'dep_lon', 'arr_lat', 'arr_lon', 'carrier', 'origin', 'dest', 'dep_loc', 'arr_loc', 'dep_arr', 'ori_dest'])\n"
     ]
    }
   ],
   "source": [
    "latbuckets = np.linspace(20.0, 50.0, NBUCKETS).tolist()  # USA\n",
    "lonbuckets = np.linspace(-120.0, -70.0, NBUCKETS).tolist() # USA\n",
    "disc = {}\n",
    "disc.update({\n",
    "       'd_{}'.format(key) : fc.bucketized_column(real[key], latbuckets) \\\n",
    "          for key in ['dep_lat', 'arr_lat']\n",
    "})\n",
    "disc.update({\n",
    "       'd_{}'.format(key) : fc.bucketized_column(real[key], lonbuckets) \\\n",
    "          for key in ['dep_lon', 'arr_lon']\n",
    "})\n",
    "\n",
    "# cross columns that make sense in combination\n",
    "sparse['dep_loc'] = fc.crossed_column([disc['d_dep_lat'], disc['d_dep_lon']], NBUCKETS*NBUCKETS)\n",
    "sparse['arr_loc'] = fc.crossed_column([disc['d_arr_lat'], disc['d_arr_lon']], NBUCKETS*NBUCKETS)\n",
    "sparse['dep_arr'] = fc.crossed_column([sparse['dep_loc'], sparse['arr_loc']], NBUCKETS ** 4)\n",
    "sparse['ori_dest'] = fc.crossed_column(['origin', 'dest'], hash_bucket_size=1000)\n",
    "\n",
    "# embed all the sparse columns\n",
    "embed = {\n",
    "       colname : fc.embedding_column(col, 10) \\\n",
    "          for colname, col in sparse.items()\n",
    "}\n",
    "real.update(embed)\n",
    "\n",
    "if DEVELOP_MODE:\n",
    "    print(sparse.keys())\n",
    "    print(real.keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Serving\n",
    "\n",
    "This serving input function is how the model will be deployed for prediction. We require these fields for prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "def serving_input_fn():\n",
    "    feature_placeholders = {\n",
    "        # All the real-valued columns\n",
    "        column: tf.placeholder(tf.float32, [None]) \\\n",
    "             for column in ('dep_delay,taxiout,distance,avg_dep_delay,avg_arr_delay' + \n",
    "                            ',dep_lat,dep_lon,arr_lat,arr_lon').split(',')\n",
    "    }\n",
    "    feature_placeholders.update({\n",
    "        column: tf.placeholder(tf.string, [None]) for column in ['carrier', 'origin', 'dest']\n",
    "    })\n",
    "    features = feature_placeholders # no transformations\n",
    "    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the model and evaluate once in a while\n",
    "\n",
    "Also checkpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing trained model to gs://cloud-training-demos-ml/flights/trained_model\n"
     ]
    }
   ],
   "source": [
    "model_dir='gs://{}/flights/trained_model'.format(BUCKET)\n",
    "os.environ['OUTDIR'] = model_dir  # needed for deployment\n",
    "print('Writing trained model to {}'.format(model_dir))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CommandException: 1 files/objects could not be removed.\n"
     ]
    }
   ],
   "source": [
    "!gcloud storage rm --recursive --continue-on-error $OUTDIR"   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "INFO:tensorflow:Using default config.\n",
      "INFO:tensorflow:Using config: {'_model_dir': 'gs://cloud-training-demos-ml/flights/trained_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f55912ec3c8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}\n",
      "INFO:tensorflow:Running training and evaluation locally (non-distributed).\n",
      "INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.\n",
      "INFO:tensorflow:Calling model_fn.\n",
      "INFO:tensorflow:Done calling model_fn.\n",
      "INFO:tensorflow:Create CheckpointSaverHook.\n",
      "INFO:tensorflow:Graph was finalized.\n",
      "INFO:tensorflow:Running local_init_op.\n",
      "INFO:tensorflow:Done running local_init_op.\n",
      "INFO:tensorflow:Saving checkpoints for 0 into gs://cloud-training-demos-ml/flights/trained_model/model.ckpt.\n",
      "INFO:tensorflow:loss = 5011.7, step = 1\n",
      "INFO:tensorflow:Saving checkpoints for 10 into gs://cloud-training-demos-ml/flights/trained_model/model.ckpt.\n",
      "INFO:tensorflow:Calling model_fn.\n",
      "WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to \"careful_interpolation\" instead.\n",
      "WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to \"careful_interpolation\" instead.\n",
      "INFO:tensorflow:Done calling model_fn.\n",
      "INFO:tensorflow:Starting evaluation at 2018-11-30-04:51:33\n",
      "INFO:tensorflow:Graph was finalized.\n",
      "INFO:tensorflow:Restoring parameters from gs://cloud-training-demos-ml/flights/trained_model/model.ckpt-10\n",
      "INFO:tensorflow:Running local_init_op.\n",
      "INFO:tensorflow:Done running local_init_op.\n",
      "INFO:tensorflow:Evaluation [1/10]\n",
      "INFO:tensorflow:Evaluation [2/10]\n",
      "INFO:tensorflow:Evaluation [3/10]\n",
      "INFO:tensorflow:Evaluation [4/10]\n",
      "INFO:tensorflow:Evaluation [5/10]\n",
      "INFO:tensorflow:Evaluation [6/10]\n",
      "INFO:tensorflow:Evaluation [7/10]\n",
      "INFO:tensorflow:Evaluation [8/10]\n",
      "INFO:tensorflow:Evaluation [9/10]\n",
      "INFO:tensorflow:Evaluation [10/10]\n",
      "INFO:tensorflow:Finished evaluation at 2018-11-30-04:51:38\n",
      "INFO:tensorflow:Saving dict for global step 10: accuracy = 0.234, accuracy_baseline = 0.766, auc = 0.5, auc_precision_recall = 0.883, average_loss = 41.49751, global_step = 10, label/mean = 0.766, loss = 4149.751, precision = 0.0, prediction/mean = 7.736658e-13, recall = 0.0\n",
      "INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10: gs://cloud-training-demos-ml/flights/trained_model/model.ckpt-10\n",
      "INFO:tensorflow:Calling model_fn.\n",
      "INFO:tensorflow:Done calling model_fn.\n",
      "INFO:tensorflow:Signatures INCLUDED in export for Classify: None\n",
      "INFO:tensorflow:Signatures INCLUDED in export for Regress: None\n",
      "INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']\n",
      "INFO:tensorflow:Signatures INCLUDED in export for Train: None\n",
      "INFO:tensorflow:Signatures INCLUDED in export for Eval: None\n",
      "INFO:tensorflow:Signatures EXCLUDED from export because they cannot be be served via TensorFlow Serving APIs:\n",
      "INFO:tensorflow:'serving_default' : Classification input must be a single string Tensor; got {'dep_delay': <tf.Tensor 'Placeholder:0' shape=(?,) dtype=float32>, 'taxiout': <tf.Tensor 'Placeholder_1:0' shape=(?,) dtype=float32>, 'distance': <tf.Tensor 'Placeholder_2:0' shape=(?,) dtype=float32>, 'avg_dep_delay': <tf.Tensor 'Placeholder_3:0' shape=(?,) dtype=float32>, 'avg_arr_delay': <tf.Tensor 'Placeholder_4:0' shape=(?,) dtype=float32>, 'dep_lat': <tf.Tensor 'Placeholder_5:0' shape=(?,) dtype=float32>, 'dep_lon': <tf.Tensor 'Placeholder_6:0' shape=(?,) dtype=float32>, 'arr_lat': <tf.Tensor 'Placeholder_7:0' shape=(?,) dtype=float32>, 'arr_lon': <tf.Tensor 'Placeholder_8:0' shape=(?,) dtype=float32>, 'carrier': <tf.Tensor 'Placeholder_9:0' shape=(?,) dtype=string>, 'origin': <tf.Tensor 'Placeholder_10:0' shape=(?,) dtype=string>, 'dest': <tf.Tensor 'Placeholder_11:0' shape=(?,) dtype=string>}\n",
      "INFO:tensorflow:'classification' : Classification input must be a single string Tensor; got {'dep_delay': <tf.Tensor 'Placeholder:0' shape=(?,) dtype=float32>, 'taxiout': <tf.Tensor 'Placeholder_1:0' shape=(?,) dtype=float32>, 'distance': <tf.Tensor 'Placeholder_2:0' shape=(?,) dtype=float32>, 'avg_dep_delay': <tf.Tensor 'Placeholder_3:0' shape=(?,) dtype=float32>, 'avg_arr_delay': <tf.Tensor 'Placeholder_4:0' shape=(?,) dtype=float32>, 'dep_lat': <tf.Tensor 'Placeholder_5:0' shape=(?,) dtype=float32>, 'dep_lon': <tf.Tensor 'Placeholder_6:0' shape=(?,) dtype=float32>, 'arr_lat': <tf.Tensor 'Placeholder_7:0' shape=(?,) dtype=float32>, 'arr_lon': <tf.Tensor 'Placeholder_8:0' shape=(?,) dtype=float32>, 'carrier': <tf.Tensor 'Placeholder_9:0' shape=(?,) dtype=string>, 'origin': <tf.Tensor 'Placeholder_10:0' shape=(?,) dtype=string>, 'dest': <tf.Tensor 'Placeholder_11:0' shape=(?,) dtype=string>}\n",
      "INFO:tensorflow:'regression' : Regression input must be a single string Tensor; got {'dep_delay': <tf.Tensor 'Placeholder:0' shape=(?,) dtype=float32>, 'taxiout': <tf.Tensor 'Placeholder_1:0' shape=(?,) dtype=float32>, 'distance': <tf.Tensor 'Placeholder_2:0' shape=(?,) dtype=float32>, 'avg_dep_delay': <tf.Tensor 'Placeholder_3:0' shape=(?,) dtype=float32>, 'avg_arr_delay': <tf.Tensor 'Placeholder_4:0' shape=(?,) dtype=float32>, 'dep_lat': <tf.Tensor 'Placeholder_5:0' shape=(?,) dtype=float32>, 'dep_lon': <tf.Tensor 'Placeholder_6:0' shape=(?,) dtype=float32>, 'arr_lat': <tf.Tensor 'Placeholder_7:0' shape=(?,) dtype=float32>, 'arr_lon': <tf.Tensor 'Placeholder_8:0' shape=(?,) dtype=float32>, 'carrier': <tf.Tensor 'Placeholder_9:0' shape=(?,) dtype=string>, 'origin': <tf.Tensor 'Placeholder_10:0' shape=(?,) dtype=string>, 'dest': <tf.Tensor 'Placeholder_11:0' shape=(?,) dtype=string>}\n",
      "WARNING:tensorflow:Export includes no default signature!\n",
      "INFO:tensorflow:Restoring parameters from gs://cloud-training-demos-ml/flights/trained_model/model.ckpt-10\n",
      "INFO:tensorflow:Assets added to graph.\n",
      "INFO:tensorflow:No assets to write.\n",
      "INFO:tensorflow:SavedModel written to: gs://cloud-training-demos-ml/flights/trained_model/export/exporter/temp-b'1543553501'/saved_model.pb\n",
      "INFO:tensorflow:Loss for final step: 2781.2466.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'accuracy': 0.234,\n",
       "  'accuracy_baseline': 0.766,\n",
       "  'auc': 0.5,\n",
       "  'auc_precision_recall': 0.883,\n",
       "  'average_loss': 41.49751,\n",
       "  'label/mean': 0.766,\n",
       "  'loss': 4149.751,\n",
       "  'precision': 0.0,\n",
       "  'prediction/mean': 7.736658e-13,\n",
       "  'recall': 0.0,\n",
       "  'global_step': 10},\n",
       " [b'gs://cloud-training-demos-ml/flights/trained_model/export/exporter/1543553501'])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "estimator = tf.estimator.DNNLinearCombinedClassifier(\n",
    "        model_dir = model_dir,\n",
    "        linear_feature_columns = sparse.values(),\n",
    "        dnn_feature_columns = real.values(),\n",
    "        dnn_hidden_units = [64, 32]\n",
    ")\n",
    "\n",
    "train_batch_size = 64\n",
    "train_input_fn = lambda: prepare_dataset(TRAIN_DATA_PATTERN, train_batch_size).make_one_shot_iterator().get_next()\n",
    "eval_batch_size = 100 if DEVELOP_MODE else 10000\n",
    "eval_input_fn = lambda: prepare_dataset(VALID_DATA_PATTERN, eval_batch_size, eval_batch_size*10, tf.estimator.ModeKeys.EVAL).make_one_shot_iterator().get_next()\n",
    "num_steps = 10 if DEVELOP_MODE else (1000000 // train_batch_size)\n",
    "\n",
    "train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps = num_steps)\n",
    "exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)\n",
    "eval_spec = tf.estimator.EvalSpec(eval_input_fn, steps=10, exporters=exporter)\n",
    "tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the trained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gs://cloud-training-demos-ml/flights/trained_model/export/exporter/1543553501/\n",
      "\n",
      "MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:\n",
      "\n",
      "signature_def['predict']:\n",
      "  The given SavedModel SignatureDef contains the following input(s):\n",
      "    inputs['arr_lat'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_7:0\n",
      "    inputs['arr_lon'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_8:0\n",
      "    inputs['avg_arr_delay'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_4:0\n",
      "    inputs['avg_dep_delay'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_3:0\n",
      "    inputs['carrier'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1)\n",
      "        name: Placeholder_9:0\n",
      "    inputs['dep_delay'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder:0\n",
      "    inputs['dep_lat'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_5:0\n",
      "    inputs['dep_lon'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_6:0\n",
      "    inputs['dest'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1)\n",
      "        name: Placeholder_11:0\n",
      "    inputs['distance'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_2:0\n",
      "    inputs['origin'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1)\n",
      "        name: Placeholder_10:0\n",
      "    inputs['taxiout'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1)\n",
      "        name: Placeholder_1:0\n",
      "  The given SavedModel SignatureDef contains the following output(s):\n",
      "    outputs['class_ids'] tensor_info:\n",
      "        dtype: DT_INT64\n",
      "        shape: (-1, 1)\n",
      "        name: head/predictions/ExpandDims:0\n",
      "    outputs['classes'] tensor_info:\n",
      "        dtype: DT_STRING\n",
      "        shape: (-1, 1)\n",
      "        name: head/predictions/str_classes:0\n",
      "    outputs['logistic'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1, 1)\n",
      "        name: head/predictions/logistic:0\n",
      "    outputs['logits'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1, 1)\n",
      "        name: add:0\n",
      "    outputs['probabilities'] tensor_info:\n",
      "        dtype: DT_FLOAT\n",
      "        shape: (-1, 2)\n",
      "        name: head/predictions/probabilities:0\n",
      "  Method name is: tensorflow/serving/predict\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "model_dir=$(gcloud storage ls ${OUTDIR}/export/exporter | tail -1)\n",    "echo $model_dir\n",
    "saved_model_cli show --dir ${model_dir} --all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "MODEL_NAME=\"flights\"\n",
    "MODEL_VERSION=\"kfp\"\n",
    "TFVERSION=\"1.10\"\n",
    "MODEL_LOCATION=$(gcloud storage ls ${OUTDIR}/export/exporter | tail -1)\n",    "echo \"Run these commands one-by-one (the very first time, you'll create a model and then create a version)\"\n",
    "yes | gcloud ml-engine versions delete ${MODEL_VERSION} --model ${MODEL_NAME}\n",
    "#gcloud ml-engine models delete ${MODEL_NAME}\n",
    "#gcloud ml-engine models create ${MODEL_NAME} --regions $REGION\n",
    "gcloud ml-engine versions create ${MODEL_VERSION} --model ${MODEL_NAME} --origin ${MODEL_LOCATION} --runtime-version $TFVERSION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CLASS_IDS  CLASSES  LOGISTIC                  LOGITS                 PROBABILITIES\n",
      "[0]        [u'0']   [4.0148590767522937e-17]  [-37.753944396972656]  [1.0, 4.0148587458800486e-17]\n",
      "[0]        [u'0']   [9.958989877342436e-18]   [-39.14805603027344]   [1.0, 9.958989877342436e-18]\n"
     ]
    }
   ],
   "source": [
    "!gcloud ml-engine predict --model=flights --version=kfp --json-instances=example_input.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
